Now for my thoughts on modifying the ALU to do ARR.
I'm leaving the details of the flag evaluation for later...

ARR in binary mode is simple:

;===========================================================================
;
; ARR binary mode:
;
;         ---       ---
;  A-----|   |tmp1 |   |tmp2
;        | & |-----|ROR|------*---> Result
;  #imm--|   |     |   |      |
;         ---       ---        ---> Flag evaluation
;                    ^
;                    |
;                   C_Flag
;
;  tmp1 := A & #imm
;  tmp2 := "ROR" tmp1 //C_Flag goes into tmp2.7
;
;  //Result = tmp2
;
;  N_Flag := tmp2.7 // = ROR input carry
;  Z_Flag := set if tmp2=$00
;  C_Flag := tmp2.6
;  V_Flag := tmp2.6 XOR tmp2.5
;
;
;  It isn't a 'real' ROR of course:
;  C_Flag goes into bit 7, but bit 0 is not shifted out into C_Flag.
;  C_Flag is loaded from tmp2.6 instead.
;
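
The binary-mode pseudocode above can be sketched in Python as a quick cross-check. This is only a behavioral model of the equations, not of the hardware; the function name `arr_binary` and the flag dictionary are made up for illustration.

```python
def arr_binary(a, imm, c_in):
    """Behavioral sketch of ARR in binary mode (names are hypothetical).

    a, imm: 8-bit operands, c_in: C_Flag before the instruction (0 or 1).
    Returns (result, flags).
    """
    tmp1 = a & imm & 0xFF
    # "ROR": C_Flag goes into bit 7, bit 0 is simply dropped.
    tmp2 = (tmp1 >> 1) | ((c_in & 1) << 7)

    bit6 = (tmp2 >> 6) & 1
    bit5 = (tmp2 >> 5) & 1
    flags = {
        "N": (tmp2 >> 7) & 1,   # = ROR input carry
        "Z": int(tmp2 == 0),
        "C": bit6,              # taken from tmp2.6, not from the shifted-out bit
        "V": bit6 ^ bit5,
    }
    return tmp2, flags
```

For instance, `arr_binary(0x01, 0x01, 0)` shifts the only set bit out, so the result is $00 with Z set and C clear.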

The LUs would do A & #imm, and pass the result (tmp1 in this example)
through the ADDR adders.

The LSR.OUT 74541 then does a 'shift right' and places the result
on the W bus.

Flag evaluation taps into the W bus, and that's all.

(I'm not sure about the microcode control signals, of course.)

;---

The problem is ARR in decimal mode:

;===========================================================================
;
; ARR decimal mode:
; 
;                     ---
;                    |   |  "correction"
;                   -|det|-----------
;                  | |   |           |
;                  |  ---            V
;         ---      |  ---           ---
;  A-----|   |tmp1 | |   |tmp2     |   |
;        | & |-----*-|ROR|------*--| + |-> Result
;  #imm--|   |       |   |      |  |   |
;         ---         ---       |   ---
;                      ^        |
;                      |         ---> Flag evaluation
;                     C_Flag
;
;  tmp1 := A & #imm
;  tmp2 := "ROR" tmp1 //C_Flag goes into tmp2.7
;
;  Now for those little boxes labeled 'det' and '+',
;  the decimal "correction":
;
;  if (tmp1 & $0F) >= $05
;  then Result.3..0 := tmp2.3..0 + $06
;  else Result.3..0 := tmp2.3..0 + $00
;
;  if (tmp1 & $F0) >= $50
;  then Result.7..4 := tmp2.7..4 + $60 ; C_Flag :=1; //nothing else would set C_Flag in ARR decimal mode !
;  else Result.7..4 := tmp2.7..4 + $00 ; C_Flag :=0;
;
;  N_Flag := tmp2.7 // = ROR input carry
;  Z_Flag := set if tmp2=$00
;  V_Flag := tmp2.6 XOR tmp2.5
;
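
Again as a cross-check, here is a minimal Python sketch of the decimal-mode pseudocode above. It assumes the two nibble corrections are independent (no carry from the low nibble into the high nibble, and the carry out of the high nibble is discarded); the name `arr_decimal_mode` and the flag dictionary are hypothetical.

```python
def arr_decimal_mode(a, imm, c_in):
    """Behavioral sketch of ARR in decimal mode (names are hypothetical).

    a, imm: 8-bit operands, c_in: C_Flag before the instruction (0 or 1).
    Returns (result, flags).
    """
    tmp1 = a & imm & 0xFF
    tmp2 = (tmp1 >> 1) | ((c_in & 1) << 7)   # C_Flag goes into tmp2.7

    lo = tmp2 & 0x0F
    hi = (tmp2 >> 4) & 0x0F

    # 'det' looks at tmp1 (before the shift), '+' corrects tmp2 (after it).
    if (tmp1 & 0x0F) >= 0x05:
        lo = (lo + 0x6) & 0x0F               # carry out of the low nibble is dropped

    if (tmp1 & 0xF0) >= 0x50:
        hi = (hi + 0x6) & 0x0F               # same as adding $60 to the byte
        c_out = 1                            # the only way C_Flag gets set here
    else:
        c_out = 0

    bit6 = (tmp2 >> 6) & 1
    bit5 = (tmp2 >> 5) & 1
    flags = {
        "N": (tmp2 >> 7) & 1,                # N, Z, V come from tmp2, not Result
        "Z": int(tmp2 == 0),
        "C": c_out,
        "V": bit6 ^ bit5,
    }
    return (hi << 4) | lo, flags
```

Note that Z is evaluated on tmp2, so `arr_decimal_mode(0x15, 0xFF, 0)` returns a result of $00 with Z still clear.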

The LUs would do A & #imm, and pass the result (tmp1 in this example)
through the ADDR adders.
 
The BCD.DETECT.LO and BCD.DETECT.HI 74151s are still connected directly
to ADR7..0, the outputs of the ADDR adders, but this time they have
to trigger on 0x5..0xF instead of 0xA..0xF.

To make those 74151s trigger on 0x5..0xF during ARR in decimal mode,
the inputs D3..7 have to be at logic low level,
the inputs D0..1 have to be at logic high level (like in the current schematics),
and the input D2 has to be fed with:
/ADR.0 for BCD.DETECT.LO
/ADR.4 for BCD.DETECT.HI

This calls for two additional NAND gates, sorry.

The BCD.DETECT trick works like this: for instance BCD.DETECT.LO:
The 74151 select inputs A..C are tied to ADR3..1.
0x5, binary 0101, would give a 010 binary pattern at the select inputs.
So for 0x5, the 74151 would select the D2 input.
A logic low level on the D2 input would trigger the BCD detection,
so we need something like ADR0 NAND ARR.decimal at D2,
where ARR.decimal is an active-high control signal that is
asserted during ARR in decimal mode.

D3 and D4 are fed by /ARR.decimal.
To tie D7..5 low, it looks like we need an additional AND gate
feeding D7..5.
This AND gate is fed with the output of the IC29A OR gate and /ARR.decimal.

For BCD.DETECT.HI, the game is similar (except that we are using ADR7..4).
The AND gate mentioned above can also feed D7..5 on BCD.DETECT.HI.
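
To check the D input wiring described above, here is a small Python model of one detector (BCD.DETECT.LO). It is a sketch under assumptions: the 74151 is reduced to "output = selected D input", a low output counts as "triggered", and the IC29A output is assumed to be low whenever normal decimal detection on 0xA..0xF is wanted. The names `mux_74151`, `bcd_detect_lo` and `ic29a_out` are made up for this example.

```python
def mux_74151(d, select):
    """8-to-1 multiplexer, Y output only: returns the selected data input."""
    return d[select & 7]


def bcd_detect_lo(nibble, arr_dec, ic29a_out=0):
    """Return True if the detector triggers (mux output low) for ADR3..0.

    arr_dec:   ARR.decimal control signal, active high (0 or 1).
    ic29a_out: output of the IC29A OR gate (assumed low = detection enabled).
    """
    adr0 = nibble & 1
    n_arr = 1 - arr_dec                 # /ARR.decimal
    d2 = 1 - (adr0 & arr_dec)           # ADR0 NAND ARR.decimal
    d3_d4 = n_arr                       # D3, D4 fed by /ARR.decimal
    d5_d7 = ic29a_out & n_arr           # additional AND gate feeding D7..5
    d = [1, 1, d2, d3_d4, d3_d4, d5_d7, d5_d7, d5_d7]
    select = (nibble >> 1) & 7          # select inputs A..C tied to ADR3..1
    return mux_74151(d, select) == 0


# Which nibble values trigger in each mode:
arr_hits = [n for n in range(16) if bcd_detect_lo(n, arr_dec=1)]
normal_hits = [n for n in range(16) if bcd_detect_lo(n, arr_dec=0)]
```

Under these assumptions `arr_hits` covers 0x5..0xF and `normal_hits` covers 0xA..0xF, which is the behavior the extra NAND and AND gates are meant to produce.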

BTW: IC29A, IC22D: those two OR gates have identical input signals,
so you could throw one of those two gates out of the schematics. :)

Another problem: in the current schematics, the carry generated by
BCD.DETECT.LO when it triggers goes into ADR.HI, and according to my
ARR test code, it shouldn't.
So we need an additional AND gate in front of the BCDLC pin of ADR.H,
gated by /ARR.decimal, to block that carry during ARR in decimal mode.

Unfortunately, the 'shift right' has to happen before the "BCD correction",
and in our ALU the 'shift right' happens as the last step.
Because of the topology of the carry chain within the correction adders,
we can't resort to tricks like adding, for instance, 0xCC as a correction
and then shifting right, sorry. :)
This can only be done by placing two 74541s between the ADR outputs
ADR7..0 and the inputs of the BCD.ADJUST adders.

Those two 74541s would form a shifter similar to ALU.OUT and LSR.OUT,
that shifts right for ARR and passes through for all the other opcodes.

So in ARR decimal mode, the flag evaluation has to tap into the
outputs of that shifter.
For the NMOS 6502 in decimal modes, you could tap into those
shifter outputs, too.
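
The two-74541 shifter described above could be modeled roughly like this in Python. It is only a sketch: one buffer is assumed to be wired straight through, the other wired one bit to the right with C_Flag on its bit 7 input, and exactly one of them is enabled at a time. The names `buffer_74541`, `pre_adjust_shifter` and the `arr` control input are hypothetical.

```python
def buffer_74541(inputs, enable):
    """Octal tri-state buffer: passes its 8 inputs when enabled, else high-Z (None)."""
    return list(inputs) if enable else None


def pre_adjust_shifter(adr, c_flag, arr):
    """Shifter between ADR7..0 and the BCD.ADJUST adder inputs.

    adr:    8-bit value on ADR7..0
    c_flag: C_Flag (0 or 1), assumed to be wired to bit 7 of the shifting buffer
    arr:    True during ARR (shift right), False for all other opcodes (pass through)
    """
    bits = [(adr >> i) & 1 for i in range(8)]             # bits[0] = ADR0
    straight = buffer_74541(bits, not arr)                # pass-through buffer
    shifted = buffer_74541(bits[1:] + [c_flag & 1], arr)  # wired one bit to the right
    driven = straight if straight is not None else shifted
    return sum(b << i for i, b in enumerate(driven))
```

With `arr` true, `pre_adjust_shifter(0x15, 1, True)` gives $8A, which is the tmp2 value the flag evaluation would tap.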

;---

Cost for modifying the ALU to do binary and decimal ARR:

2 * 74541
2 * AND gate
2 * NAND gate

;-------

The alternative is to build "dedicated circuitry" that only does ARR
and feeds the W bus:

2 * 74541
2 * 74151
2 * 74283

But I think this won't simplify flag evaluation.

Cheers,
Dieter.



